21 research outputs found

    An Evolutionary Optimization Approach for Categorical Data Protection

    The continuously growing amount of public sensitive data has increased the risk of breaching the privacy of the people or institutions in those datasets. Many protection methods have been developed to address this problem by either distorting or generalizing the data, while taking into account the difficult trade-off between data utility (information loss) and protection against disclosure (disclosure risk). In this paper we present an optimization approach for data protection based on an evolutionary algorithm guided by a combination of information loss and disclosure risk measures. In this way, state-of-the-art protection methods are combined to obtain new data protections with a better trade-off between these two measures. The paper presents several experimental results that assess the performance of our approach.

    An evolutionary algorithm to enhance multivariate Post-Randomization Method (PRAM) protections

    The amount of public statistical information available is growing, and more accurate protection methods are needed in order to achieve data confidentiality. The Post-Randomization Method (PRAM) was introduced in 1997 as a very powerful protection method for categorical microdata, but it is still not widely used. The method takes a Markov matrix as a parameter. The main obstacle to applying it is that it is difficult to find a good Markov matrix, one whose changes to the microdata file produce both a low loss of valuable information and a low risk of disclosure of sensitive data. In this paper we present a methodology that helps to find a matrix that yields better protections. This is achieved by using an evolutionary algorithm with integrated information loss and disclosure risk measures. Experiments using three different datasets are also presented in order to empirically evaluate the application of this technique. © 2014 Elsevier Inc. All rights reserved. This work has been done under the PhD in Computer Science program of the Universitat Autònoma de Barcelona (UAB). It is also partially supported by the Spanish MEC ARES-CONSOLIDER INGENIO 2010 CSD2007-00004, and COPRIVACY TIN2011-27076-C03-03. The research leading to these results has also received funding from the European Union’s Seventh Framework Programme (FP7/2007-2013) under Grant Agreement Num. 262608. Peer reviewed.
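
    As a rough illustration of how PRAM uses its Markov matrix, the sketch below replaces each categorical value with a category drawn from the corresponding row of a row-stochastic transition matrix. The function name `apply_pram`, the three categories, and the matrix values are assumptions made up for this example; they are not taken from the paper.

```python
import random

def apply_pram(values, categories, matrix, rng):
    """Perturb each value by sampling from its row of the transition matrix.

    matrix[i][j] is the probability that category i is published as category j,
    so every row is expected to sum to 1 (hypothetical helper, illustration only).
    """
    index = {category: i for i, category in enumerate(categories)}
    protected = []
    for value in values:
        row = matrix[index[value]]
        protected.append(rng.choices(categories, weights=row, k=1)[0])
    return protected

# Hypothetical categories and transition matrix (assumed values).
categories = ["A", "B", "C"]
matrix = [
    [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
]

rng = random.Random(0)  # seeded so the run is repeatable
original = ["A", "B", "C", "A", "B"]
protected = apply_pram(original, categories, matrix, rng)
print(protected)
```

    Larger diagonal entries leave more records unchanged (less information loss, more disclosure risk), which is the trade-off the abstract says the evolutionary algorithm has to balance.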

    Protecció de dades categòriques i anàlisi de la pèrdua d'informació (Categorical data protection and information loss analysis)

    Due to the expansion of our society, more and more public data sources (medical, financial, ...) become available every day for statistical studies. These data sources put the confidential information of people and institutions at risk because they are accessible to everybody, so they need to be protected before publication. This project presents the different protection methods for categorical data, together with an analysis of each one to determine its information loss and disclosure risk. Finally, a method to optimize the results obtained by the PRAM method has also been developed.

    Desarrollo integral de las competencias genéricas mediante mapas competenciales (Comprehensive development of generic competencies through competency maps)

    In the context of the European Higher Education Area (EHEA), curriculum design needs to be based on the competencies of each degree programme, both domain-specific and generic. Although Spanish universities already have wide experience in developing and assessing domain-specific competencies, generic competencies pose a new challenge that needs to be addressed. The present work proposes a model to globally develop and assess generic competencies in a Bachelor's degree. The proposal is being implemented in the Bachelor's Degree in Informatics Engineering at the Barcelona School of Informatics (Facultat d’Informàtica de Barcelona). A common procedure to develop domain-specific competencies consists of setting different competency levels (based on Bloom's taxonomy) and then assigning them to the corresponding subjects or courses in the programme. Instead, in order to develop generic competencies as a comprehensive, integrated experience, we propose defining each competency in terms of dimensions (or competency aspects), each of which is further defined by objectives at three levels. The objectives of a given level of each dimension are assigned to the subjects considered suitable for this purpose. Thus, one subject may integrate dimensions belonging to different competencies at different levels, which contributes to an integral educational experience. In the process of designing our global map of competency dimensions, we found that some competencies share a subset of dimensions, which calls for workload optimization. This global map allows us to refine the process of assigning competency objectives to subjects. Although recurrent practice may be appropriate in the development of competencies in general, we can avoid redundancy when it is not strictly necessary. This procedure therefore helps us integrate objectives into the corresponding subjects most effectively, helping students develop the generic competencies defined in the degree programme.

    Categorical Data Protection on Statistical Datasets and Social Networks

    The continuous growth of public sensitive data has increased the risk of breaching the privacy of the people or institutions in those datasets. This growth is now even faster because of the expansion of the Internet, which makes it very important to assess the performance of the methods used to protect those datasets. Two kinds of measures are used for this assessment: information loss and disclosure risk. Another area where privacy plays an increasing role is social networks. They have become an essential ingredient of interpersonal communication in the modern world. They enable users to express and share common interests and to comment on everyday events with all the people with whom they are connected. Indeed, the growth of social media has been rapid and has led to the adoption of social networks to serve specific communities of interest. However, this shared information space can prove dangerous for user privacy. In addition to the explicit "posts", there is much implicit semantic information that is not explicitly given in the posts that the user shares. For these and other reasons, the information pertaining to each user needs to be protected. This thesis presents some new approaches to these problems. The main contributions are:
    • The development of an approach for protecting microdata datasets based on evolutionary algorithms that automatically search for better protections in terms of information loss and disclosure risk.
    • The development of an evolutionary approach to optimize the transition matrices used in the Post-Randomization masking method in order to generate better protections.
    • The definition of an approach to categorical microdata protection based on running a clustering algorithm before protecting, which yields protected data with better utility.
    • The definition of a way to extract both implicit and explicit information from a real social network such as Twitter, the development of a protection method to deal with this information, and the definition of new measures to evaluate the protection quality in these scenarios.

    On the Protection of Social Network-Extracted Categorical Microdata

    Social networks have become an essential part of people’s communication. They allow users to express and share the things they like with all the people they are connected with. However, this shared information can endanger their privacy. In addition, some information is not explicitly given but is implicit in the text of the posts that the user shares. For that reason, the information of each user needs to be protected. In this paper we present how implicit information can be extracted from the shared posts and how a microdata dataset can be built from a social network graph. Furthermore, we protect this dataset in order to make the users’ data more private. Peer reviewed.

    Data privacy using an evolutionary algorithm for invariant PRAM matrices

    Dissemination of data with sensitive information carries an implicit risk of unauthorized disclosure. Several masking methods have been developed in order to protect the data without losing too much information. One such method is the Post Randomization Method (PRAM), which perturbs a categorical variable according to a Markov probability transition matrix. The method has the drawback that it is difficult to find an optimal transition matrix that performs the perturbations while maximizing data utility. An evolutionary algorithm that generates an optimal probability transition matrix is proposed. Optimality is defined with respect to a pre-defined fitness function that depends on the aspects of the data that need to be preserved following perturbation. The algorithm embeds two properties: the invariance of the transition matrix, which preserves marginal totals in expectation, and the control of the diagonal probabilities, which determine the amount of perturbation. Experimental results using a real data set are presented in order to illustrate and empirically evaluate the application of this algorithm. © 2014 Elsevier Ireland Ltd. All rights reserved. This work has been carried out under the Ph.D. in Computer Science program of the Universitat Autònoma de Barcelona (UAB). It is also partially supported by the Spanish MEC ARES-CONSOLIDER INGENIO 2010 CSD2007-00004, and COPRIVACY TIN2011-27076-03-03. The research was also funded by the European Union’s Seventh Framework infrastructure research grant: 262608, Data Without Boundaries (DwB). Peer reviewed.
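
    The invariance property mentioned above means that the vector of category counts t satisfies tR = t for the transition matrix R, so marginal totals are preserved in expectation. The sketch below shows one standard two-stage construction (multiplying a matrix P by its Bayes "reverse" matrix Q), which is not the evolutionary algorithm of the paper. The names `expected_counts` and `make_invariant` and all numeric values are assumptions for illustration.

```python
def expected_counts(t, M):
    """Expected category counts after applying row-stochastic matrix M to counts t."""
    n = len(t)
    return [sum(t[k] * M[k][l] for k in range(n)) for l in range(n)]

def make_invariant(P, t):
    """Return R = P * Q, where Q[l][k] = P[k][l] * t[k] / (t P)[l].

    Q is the Bayes reverse of P with respect to t, so t R = t.
    Assumes every entry of t P is strictly positive.
    """
    n = len(t)
    s = expected_counts(t, P)
    Q = [[P[k][l] * t[k] / s[l] for k in range(n)] for l in range(n)]
    return [[sum(P[i][l] * Q[l][j] for l in range(n)) for j in range(n)]
            for i in range(n)]

# Hypothetical category counts and starting matrix (assumed values).
counts = [50.0, 30.0, 20.0]
P = [
    [0.8, 0.1, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.3, 0.6],
]

R = make_invariant(P, counts)
print(expected_counts(counts, P))  # P alone shifts the expected marginals
print(expected_counts(counts, R))  # R keeps them equal to counts, up to rounding
```

    The diagonal of R gives the probability that each category is left unchanged, which is the second property the abstract says the algorithm controls.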

    PRAM Optimization Using an Evolutionary Algorithm

    PRAM (Post Randomization Method) was introduced in 1997, but it is still one of the least used methods in statistical categorical data protection, because it is difficult to obtain a transition matrix that yields a good protection. In this paper, we describe how to obtain a better protection using an evolutionary algorithm with integrated information loss and disclosure risk measures to find the best matrix. We also provide experiments using a real dataset of 1000 records in order to empirically evaluate the application of this technique.
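
    The idea of searching for a transition matrix with an evolutionary algorithm guided by information loss and disclosure risk can be sketched as a minimal (1+1) mutate-and-select loop. Everything here is an assumption for illustration: the toy `information_loss` and `disclosure_risk` measures, the equal weighting in `fitness`, and the mutation scheme are placeholders, not the measures or operators used in the paper.

```python
import random

def expected_counts(t, P):
    n = len(t)
    return [sum(t[k] * P[k][l] for k in range(n)) for l in range(n)]

def information_loss(t, P):
    """Toy IL: total variation distance between original and expected marginals."""
    e = expected_counts(t, P)
    return 0.5 * sum(abs(a - b) for a, b in zip(t, e)) / sum(t)

def disclosure_risk(t, P):
    """Toy DR: expected share of records whose value is left unchanged."""
    return sum(t[k] * P[k][k] for k in range(len(t))) / sum(t)

def fitness(t, P):
    """Lower is better; equal weights are an assumption."""
    return 0.5 * information_loss(t, P) + 0.5 * disclosure_risk(t, P)

def mutate(P, rng, step=0.1):
    """Nudge one random cell, then renormalize its row so it still sums to 1."""
    child = [row[:] for row in P]
    i = rng.randrange(len(child))
    j = rng.randrange(len(child))
    child[i][j] = max(1e-6, child[i][j] + rng.uniform(-step, step))
    total = sum(child[i])
    child[i] = [x / total for x in child[i]]
    return child

def evolve(t, start, generations, rng):
    """Keep the mutated matrix whenever it is at least as fit as the current one."""
    best, best_score = start, fitness(t, start)
    for _ in range(generations):
        candidate = mutate(best, rng)
        score = fitness(t, candidate)
        if score <= best_score:
            best, best_score = candidate, score
    return best, best_score

# Hypothetical category counts and starting matrix (assumed values).
counts = [50.0, 30.0, 20.0]
start = [
    [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
]
best, best_score = evolve(counts, start, 200, random.Random(42))
print(fitness(counts, start), best_score)
```

    A population-based algorithm with crossover would follow the same pattern, with the fitness function remaining the place where the two measures are combined.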